REPORT  DOCUMENTATION  PAGE 


Form  Approved  OMB  NO.  0704-0188 


The public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions,
searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments
regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington
Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington VA, 22202-4302.
Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to any penalty for failing to comply with a collection
of information if it does not display a currently valid OMB control number.

PLEASE  DO  NOT  RETURN  YOUR  FORM  TO  THE  ABOVE  ADDRESS. 


1.  REPORT  DATE  (DD-MM-YYYY) 


2.  REPORT  TYPE 

New  Reprint 


4.  TITLE  AND  SUBTITLE 

Discriminative  and  Compact  Dictionary  Design  for 
Hyperspectral  Image  Classification  using  Learning  VQ 
Framework 


6.  AUTHORS 

Zhaowen  Wang,  Nasser  Nasrabadi,  Thomas  Huang 


3.  DATES  COVERED  (From  -  To) 


5a. CONTRACT NUMBER
W911NF-09-1-0383


5b.  GRANT  NUMBER 


5c.  PROGRAM  ELEMENT  NUMBER 
611103 


5d.  PROJECT  NUMBER 


5e.  TASK  NUMBER 


5f.  WORK  UNIT  NUMBER 


7.  PERFORMING  ORGANIZATION  NAMES  AND  ADDRESSES 

William  Marsh  Rice  University 
6100  Main  St.,  MS-16 


8.  PERFORMING  ORGANIZATION  REPORT 
NUMBER 


Houston,  TX  77005  -1827 


9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS 
(ES) 

U.S.  Army  Research  Office 
P.O.Box  12211 

Research  Triangle  Park,  NC  27709-2211 


10.  SPONSOR/MONITOR'S  ACRONYM(S) 
ARO 


11.  SPONSOR/MONITOR'S  REPORT 
NUMBER(S) 

56177-CS-MUR.162 


12. DISTRIBUTION AVAILABILITY STATEMENT
Approved for public release; distribution is unlimited.


13.  SUPPLEMENTARY  NOTES 

The views, opinions and/or findings contained in this report are those of the author(s) and should not be construed as an official Department
of the Army position, policy or decision, unless so designated by other documentation.


14.  ABSTRACT 

Sparse  representation  provides  an  efficient  description  for 
high-dimensional  Hyperspectral  Imagery  (HSI)  and  also  encodes 
discriminative  information  useful  for  classification. 

However,  due  to  the  large  size  of  typical  HSI  images,  the 
naive  way  to  construct  a  dictionary  with  all  training  pixels 


15.  SUBJECT  TERMS 

sparse representation, learning vector quantization, hyperspectral image classification


16. SECURITY CLASSIFICATION OF:
a. REPORT: UU
b. ABSTRACT: UU
c. THIS PAGE: UU

17. LIMITATION OF ABSTRACT: UU

15. NUMBER OF PAGES

Richard  Baraniuk 


19b.  TELEPHONE  NUMBER 
713-348-5132 


Standard Form 298 (Rev 8/98)
Prescribed  by  ANSI  Std.  Z39.18 




REPORT  DOCUMENTATION  PAGE  (SF298) 
(Continuation  Sheet) 

Continuation  for  Block  13 


ARO  Report  Number  56177. 162-CS-MUR 
Discriminative  and  Compact  Dictionary  Design  f... 


Block  13:  Supplementary  Note 

©2013 . Published in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, Vol. Ed. 0
(2013), (Ed. ). DoD Components reserve a royalty-free, nonexclusive and irrevocable right to reproduce, publish, or otherwise
use the work for Federal purposes, and to authorize others to do so (DODGARS §32.36). The views, opinions and/or findings
contained in this report are those of the author(s) and should not be construed as an official Department of the Army position,
policy or decision, unless so designated by other documentation.

Approved  for  public  release;  distribution  is  unlimited. 
ABSTRACT

Sparse representation provides an efficient description for
high-dimensional Hyperspectral Imagery (HSI) and also encodes
discriminative information useful for classification.
However, due to the large size of typical HSI images, the
naive way to construct a dictionary with all training pixels
is neither efficient nor practical. In this paper, a novel
approach is proposed to design a compact dictionary for Sparse
Representation-based Classification (SRC). Inspired
by Learning Vector Quantization (LVQ) techniques, we use a
hinge loss function directly related to the classification task as our
objective function, and optimize the dictionary by exploiting
the differentiable parts of sparse codes. The resultant dictionary
updating procedure adapts the “push” and “pull” actions
in LVQ to SRC, and is therefore named Learning Sparse
Representation-based Classification (LSRC). Experiments on
different HSI images demonstrate that our LSRC approach
can achieve higher classification accuracy with a substantially
smaller dictionary than using the whole training set, and
also outperforms existing dictionary learning methods.

Index  Terms —  sparse  representation,  learning  vector 
quantization,  hyperspectral  image  classification 

1.  INTRODUCTION 

Hyperspectral Imagery (HSI) is an important tool in remote
sensing which can measure distinct spectral signatures for
different ground materials, and it is widely applied in agriculture,
military, mineralogy, etc. Different approaches have
been used to classify HSI data; successful examples include
Support Vector Machine (SVM) [1] and its variations [2, 3].

More recently, Sparse Representation-based Classification
(SRC) [4] has also been applied to HSI classification,
and achieves competitive results [5]. Sparse representation
expresses a signal as the linear combination of very few atoms
from an over-complete dictionary, and the resulting sparse
code can reveal its class information if signals from different
classes lie in different subspaces. The effectiveness of SRC
has already been proven in face recognition [4], expression
recognition [6], and speaker verification [7]. Good performance
on HSI classification is also expected because the high
correlation among different channels of an HSI image intrinsically
induces a low-dimensional subspace in which samples
can be sparsely represented.

A good dictionary characterizing the subspace structure of
each class is the key for SRC to attain high classification accuracy.
Conventionally, the SRC dictionary is constructed by directly
combining all the training samples [4, 5], which is neither
efficient nor practical for HSI data with a huge number of
samples. Random sampling or clustering methods can give
compact dictionaries, but generative as well as discriminative
capabilities are lost in such sub-optimal dictionaries. There
has been a recent trend in the computer vision and machine
learning communities toward learning condensed dictionaries
well fitted to large-scale training data. Generative approaches,
such as Method of Optimal Direction (MOD) [8], K-SVD
[9, 10], and the relaxed $\ell_1$ formulations [11, 12], have focused
on minimizing signal reconstruction errors. For better performance
on classification, discrimination costs have also been
incorporated in a supervised manner [13, 14, 15], and classification
models other than SRC have been used with sparse
codes as inputs [16, 17, 18, 19, 20, 21]. However, the discrimination
metrics used in existing methods are not geared to the
mechanism of SRC, and the employment of an extra classification
model leads to more parameters, which increase the
risk of over-fitting and break the unified framework of SRC.

In this paper, a new dictionary learning algorithm is proposed
particularly for the purpose of classification with SRC.
We optimize the dictionary by minimizing the hinge loss
of the residual difference between competing classes, which is
inspired by the idea behind Learning Vector Quantization
(LVQ) [22]. LVQ techniques were first applied to dictionary
learning by Chen et al. [23] in an ad-hoc way; here
we adapt the philosophy of LVQ to SRC in a more principled
manner (as formulated in Section 2), and hence name the
algorithm Learning Sparse Representation-based Classification
(LSRC). Stochastic gradient descent is used in LSRC
to circumvent the non-differentiable part of the sparse code, and
leads to updating rules (derived in Section 3) mimicking the
“push” and “pull” actions of LVQ. Superior classification
results are achieved using the proposed LSRC algorithm on
several HSI images (reported in the experiments in Section
4). We also discuss our contributions related to prior works
(in Section 5) and draw concluding remarks (in Section 6).

2.  PROBLEM  FORMULATION 

2.1.  Sparse  Representation-based  Classification 

Suppose we have a data set containing N labeled HSI pixels
of m channels coming from C classes: $\{\mathbf{x}_i, y_i\}_{i=1...N}$,
$\mathbf{x}_i \in \mathbb{R}^m$, $y_i \in \{1, ..., C\}$. A dictionary
$\mathbf{D} \in \mathbb{R}^{m \times n}$ of size n used in SRC [4] is composed of C
class-wise sub-dictionaries $\mathbf{D}^c \in \mathbb{R}^{m \times n/C}$ such that
$\mathbf{D} = [\mathbf{D}^1, ..., \mathbf{D}^C]$. The sparse code
$\boldsymbol{\alpha}_i \in \mathbb{R}^n$ for pixel $\mathbf{x}_i$ can be recovered by solving the
following $\ell_1$-regularized problem as in compressive sensing [24]:

$$\boldsymbol{\alpha}_i = \arg\min_{\mathbf{z}} \|\mathbf{D}\mathbf{z} - \mathbf{x}_i\|_2^2 + \lambda |\mathbf{z}|_1, \quad \text{with } \lambda > 0. \qquad (1)$$

The sparse code can be decomposed into C sub-codes in a
similar way: $\boldsymbol{\alpha}_i = [\boldsymbol{\alpha}_i^1; ...; \boldsymbol{\alpha}_i^C]$. SRC makes its classification
decision based on the residual of the signal approximated by the
sub-code of each class: $r_i^c = \|\mathbf{e}_i^c\|_2^2$, where
$\mathbf{e}_i^c = \mathbf{x}_i - \mathbf{D}^c \boldsymbol{\alpha}_i^c$ is
the class-wise reconstruction error. The predicted class label
is obtained as

$$\hat{y}_i = \arg\min_{c} r_i^c. \qquad (2)$$

Generally, our goal is to find an optimal dictionary $\mathbf{D}^*$
that achieves the best classification on the data set:

$$\mathbf{D}^* = \arg\min_{\mathbf{D} \in \mathcal{D}} \sum_i I(\hat{y}_i \neq y_i), \qquad (3)$$

where $I(\cdot)$ is the indicator function, and $\mathcal{D}$ is the space of
matrices with unit-length columns.

2.2. Objective Function with Insight from LVQ

Although closely related to our task of classification, Eq. (3)
cannot be solved directly. A recent work [23] applied the
LVQ technique to learn the dictionary for SRC, which motivated
us to design a more appropriate objective function
based on the insight from LVQ.

LVQ [22] is a supervised learning algorithm which generates
a codebook optimized for a prototype-based classifier.
In testing, LVQ assigns a sample the label of its closest
prototype in the codebook, which is essentially
nearest neighbor classification. During
training, LVQ (in its simplest version) iteratively goes
through each training sample $\mathbf{x}_i$ and moves its nearest prototype
$\mathbf{m}_{n(i)}$ towards or away from $\mathbf{x}_i$ based on whether $\mathbf{m}_{n(i)}$
belongs to the same class as $\mathbf{x}_i$:

$$\mathbf{m}_{n(i)} \leftarrow \begin{cases} \mathbf{m}_{n(i)} + \rho(\mathbf{x}_i - \mathbf{m}_{n(i)}), & \text{if } \mathbf{m}_{n(i)} \text{ has label } y_i \\ \mathbf{m}_{n(i)} - \rho(\mathbf{x}_i - \mathbf{m}_{n(i)}), & \text{otherwise} \end{cases} \qquad (4)$$

where $0 < \rho < 1$ is a monotonically decreasing step size.

LVQ shares a common spirit with SRC in several
ways. Both represent data samples with a subset of
elements in a codebook or dictionary, and classify the samples
based on the energy distribution in the selected prototypes or
atoms. This justifies the attempt in [23] to use updating rules
similar to Eq. (4) in learning a dictionary for SRC. However,
the underlying principles of sparse coding and vector quantization
are quite different, so the performance of
the ad-hoc approach in [23] is not guaranteed.

A deeper insight into LVQ has been developed in [25],
which regards the learning procedure as a stochastic gradient
descent algorithm with a loss function defined on any misclassified
sample $\mathbf{x}_i$:

$$\mathcal{L}_{LVQ}(\mathbf{x}_i, y_i) \propto \|\mathbf{x}_i - \mathbf{m}^+_{n(i)}\|_2^2 - \|\mathbf{x}_i - \mathbf{m}^-_{n(i)}\|_2^2, \qquad (5)$$

where $\mathbf{m}^+_{n(i)}$ and $\mathbf{m}^-_{n(i)}$ are the nearest prototypes to $\mathbf{x}_i$ with
label $y_i$ and with a label other than $y_i$, respectively. We adopt an objective
function with a form similar to Eq. (5), with the hope
that the merits of LVQ can be exploited in building an SRC
dictionary. Specifically, a hinge loss function is enforced on
each data point:

$$\mathcal{L}_{LSRC}(\mathbf{x}_i, y_i; \mathbf{D}) = \max(0, r_i^{y_i} - r_i^{\hat{c}_i} + b), \qquad (6)$$

where

$$\hat{c}_i = \arg\min_{c \in \{1,...,C\} \setminus y_i} r_i^c \qquad (7)$$

is the most competitive class in reconstructing the signal excluding
the true class $y_i$, and b is a non-negative parameter controlling
the “margin” between the classes. The loss function
in Eq. (6) is zero when the residual of the true class is smaller
than that of any other class by at least b. Otherwise, it
gives a penalty proportional to the residual difference between
the true class and the most competitive “imposter” class. Intuitively,
this loss function is also related to the misclassification
rate of SRC. Thus, we can formulate the problem of LSRC as:

$$\mathbf{D}^* = \arg\min_{\mathbf{D} \in \mathcal{D}} \frac{1}{N} \sum_i \mathcal{L}_{LSRC}(\mathbf{x}_i, y_i; \mathbf{D}). \qquad (8)$$
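
The SRC decision rule of Eq. (2) and the hinge loss of Eqs. (6)-(7) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the sparse code of Eq. (1) is approximated with a plain ISTA loop (a stand-in solver chosen for brevity, not necessarily the one used by the authors), class labels are 0-based, and the names `ista`, `class_residuals`, `src_predict`, `lsrc_hinge` and `atom_cls` are hypothetical.

```python
import numpy as np

def ista(D, x, lam, n_iter=200):
    """Approximate Eq. (1): min_z ||D z - x||_2^2 + lam * |z|_1 (stand-in solver)."""
    step = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        v = z - step * 2.0 * D.T @ (D @ z - x)        # gradient step on the quadratic term
        z = np.sign(v) * np.maximum(np.abs(v) - lam * step, 0.0)  # soft thresholding
    return z

def class_residuals(D, atom_cls, alpha, x, n_classes):
    """r^c = ||x - D^c alpha^c||_2^2 for every class c (labels are 0-based here)."""
    r = np.empty(n_classes)
    for c in range(n_classes):
        mask = atom_cls == c                          # atoms belonging to class c
        e_c = x - D[:, mask] @ alpha[mask]            # class-wise reconstruction error
        r[c] = e_c @ e_c
    return r

def src_predict(r):
    """Eq. (2): the class with the smallest residual."""
    return int(np.argmin(r))

def lsrc_hinge(r, y, b):
    """Eqs. (6)-(7): hinge loss against the most competitive wrong class."""
    r_wrong = np.delete(r, y).min()
    return max(0.0, r[y] - r_wrong + b)

# Toy usage: four orthonormal atoms, two classes with two atoms each.
D = np.eye(4)
atom_cls = np.array([0, 0, 1, 1])
x = np.array([1.0, 0.0, 0.0, 0.0])
r = class_residuals(D, atom_cls, ista(D, x, lam=0.1), x, n_classes=2)
print(src_predict(r), lsrc_hinge(r, y=0, b=0.2))
```

In the toy case the signal is explained almost entirely by a class-0 atom, so the class-0 residual is small and the loss vanishes when the true label is 0; with the wrong label the loss equals the residual gap plus the margin.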


3.  DICTIONARY  OPTIMIZATION 

Since the sample size N is usually large, stochastic gradient
descent methods are favored to optimize a dictionary online
when the objective function is an expectation over all the
training samples [12]. The dictionary is first initialized with
a reasonable guess $\mathbf{D}^0$ (through K-means or an unsupervised
training for each class), and then it is updated iteratively by
going through the whole data set for multiple epochs until convergence.
In the t-th iteration, a single sample $(\mathbf{x}_i, y_i)$¹ is drawn
from the data set randomly and the dictionary is updated along
the negative gradient of its cost term:

$$\mathbf{D}^t = \mathbf{D}^{t-1} - \rho^t \nabla_{\mathbf{D}} \mathcal{L}_{LSRC}(\mathbf{x}, y; \mathbf{D}^{t-1}), \qquad (9)$$

where $\rho^t = \rho^0 / \sqrt{(t-1)/N + 1}$ is the step size at iteration t with
initial value $\rho^0$. The gradient of the hinge loss is

$$\nabla_{\mathbf{D}} \mathcal{L}_{LSRC}(\mathbf{x}, y; \mathbf{D}) = \nabla_{\mathbf{D}} r^y - \nabla_{\mathbf{D}} r^{\hat{c}}, \quad \text{if } r^y - r^{\hat{c}} + b > 0, \qquad (10)$$

and zero or undefined otherwise. We can ignore the case of an
undefined gradient, because it occurs with very low probability
in practice (only when $r^y - r^{\hat{c}} + b = 0$) and thus will not
affect the convergence of stochastic gradient descent as long
as a suitable step size is chosen [25].

¹ For simplicity, we drop all the data indices i hereafter.

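To evaluate the gradient of $r^c$ for a particular class c, we
first find its derivative with respect to the (i, j)-th element of
$\mathbf{D}$ as

$$\frac{\partial r^c}{\partial d_{ij}} = -2\,\mathbf{e}^{cT} \frac{\partial (\mathbf{D}\mathbf{P}_c \boldsymbol{\alpha})}{\partial d_{ij}} = -2\,\mathbf{e}^{cT} \left[ \mathbf{P}_c(j,j)\,\alpha_j \mathbf{u}_i + \mathbf{D}\mathbf{P}_c \frac{\partial \boldsymbol{\alpha}}{\partial d_{ij}} \right], \qquad (11)$$

where $\mathbf{P}_c$ is an n × n diagonal matrix with 1 at positions
corresponding to class c and 0 otherwise, and $\mathbf{u}_i$ is an m × 1 unit
column vector with the i-th element equal to 1.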

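The sparse code $\boldsymbol{\alpha}$ is an implicit function of $\mathbf{D}$, and it has
been shown to be differentiable [20, 26] with respect to any dictionary
atom $\mathbf{d}_j$ with index j in the active set $\Lambda = \{j \mid \alpha_j \neq 0\}$.
For the other atoms, the gradient is zero with overwhelming
probability and thus can be ignored for the same reason mentioned
above. Directly using the result given in [26], we can
find the sparse code derivative as:

$$\frac{\partial \boldsymbol{\alpha}_\Lambda}{\partial \mathbf{D}_\Lambda} = -\mathbf{A}^{-1} \frac{\partial [\mathbf{D}_\Lambda^T (\mathbf{D}_\Lambda \boldsymbol{\alpha}_\Lambda - \mathbf{x})]}{\partial \mathbf{D}_\Lambda}, \qquad (12)$$

where $\boldsymbol{\alpha}_\Lambda$ and $\mathbf{D}_\Lambda$ denote the sparse coefficients and dictionary
columns corresponding to the active set $\Lambda$, and $\mathbf{A} = \mathbf{D}_\Lambda^T \mathbf{D}_\Lambda$.
In practice we set $\mathbf{A} = \mathbf{D}_\Lambda^T \mathbf{D}_\Lambda + \epsilon \cdot \mathbf{I}$ to ensure the stability
of the inverse of $\mathbf{A}$, where $\epsilon$ is a small positive constant. It
is then easy to obtain for any $j \in \Lambda$:

$$\frac{\partial \boldsymbol{\alpha}}{\partial d_{ij}} = \mathbf{P}_\Lambda \left[ \mathbf{A}^{-1}(:, \Lambda^{-1}(j)) \cdot e_i - \mathbf{A}^{-1} \mathbf{D}_\Lambda(i,:)^T \cdot \alpha_j \right], \qquad (13)$$

where $\mathbf{P}_\Lambda \in \mathbb{R}^{n \times |\Lambda|}$, $\mathbf{P}_\Lambda(j, k) = I(j = \Lambda(k))$, $\Lambda(k)$ denotes
the k-th element of $\Lambda$ sorted in ascending order, $\Lambda^{-1}(\cdot)$ is the
inverse function of $\Lambda(\cdot)$, and $\mathbf{e} = \mathbf{x} - \mathbf{D}\boldsymbol{\alpha} = \mathbf{x} - \mathbf{D}_\Lambda \boldsymbol{\alpha}_\Lambda$.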
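
For readers who want to see where Eq. (12) comes from, the following short derivation may help. It is a sketch under two assumptions that the text does not spell out: the data term in Eq. (1) is the squared $\ell_2$ norm, and a small perturbation of $\mathbf{D}$ changes neither the active set $\Lambda$ nor the sign vector $\mathbf{s} = \mathrm{sign}(\boldsymbol{\alpha}_\Lambda)$ (a symbol introduced here only for illustration).

```latex
% First-order optimality of Eq. (1) restricted to the active set:
2\,\mathbf{D}_\Lambda^T(\mathbf{D}_\Lambda\boldsymbol{\alpha}_\Lambda - \mathbf{x}) + \lambda\,\mathbf{s} = \mathbf{0}
\;\;\Longrightarrow\;\;
\boldsymbol{\alpha}_\Lambda = (\mathbf{D}_\Lambda^T\mathbf{D}_\Lambda)^{-1}
\left(\mathbf{D}_\Lambda^T\mathbf{x} - \tfrac{\lambda}{2}\,\mathbf{s}\right).

% Differentiate the optimality condition with s held fixed. The total change
% is the change due to D_Lambda (alpha_Lambda held fixed) plus the change due
% to alpha_Lambda, and the two must cancel:
\left.\partial\!\left[\mathbf{D}_\Lambda^T(\mathbf{D}_\Lambda\boldsymbol{\alpha}_\Lambda - \mathbf{x})\right]
\right|_{\boldsymbol{\alpha}_\Lambda\ \mathrm{fixed}}
+ \mathbf{D}_\Lambda^T\mathbf{D}_\Lambda\,\partial\boldsymbol{\alpha}_\Lambda = \mathbf{0}
\;\;\Longrightarrow\;\;
\partial\boldsymbol{\alpha}_\Lambda = -\mathbf{A}^{-1}\,
\left.\partial\!\left[\mathbf{D}_\Lambda^T(\mathbf{D}_\Lambda\boldsymbol{\alpha}_\Lambda - \mathbf{x})\right]
\right|_{\boldsymbol{\alpha}_\Lambda\ \mathrm{fixed}},
\qquad \mathbf{A} = \mathbf{D}_\Lambda^T\mathbf{D}_\Lambda .
```

Read this way, the derivative on the right-hand side of Eq. (12) is taken with $\boldsymbol{\alpha}_\Lambda$ held fixed, and evaluating it for a single entry $d_{ij}$ gives the two terms of Eq. (13).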

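Combining all the equations above and after some manipulations,
we get the gradient of $r^c$ with respect to the j-th
dictionary atom for any $j \in \Lambda$:

$$\nabla_{\mathbf{d}_j} r^c = -2\alpha_j I(\mathrm{cls}(j) = c) \cdot \mathbf{e}^c - 2\beta^c_{\Lambda^{-1}(j)} \cdot \mathbf{e} + 2\alpha_j \mathbf{D}_\Lambda \boldsymbol{\beta}^c, \qquad (14)$$

where cls(j) is the class label of the j-th dictionary atom, and
$\boldsymbol{\beta}^c = \mathbf{A}^{-1} \mathbf{P}_\Lambda^T \mathbf{P}_c \mathbf{D}^T \mathbf{e}^c$. Thus, the update for each atom in
the active set $\Lambda$ is:

$$\Delta\mathbf{d}_j^t = \mathbf{d}_j^t - \mathbf{d}_j^{t-1} = 2\rho^t \left[ \alpha_j I(\mathrm{cls}(j) = y) \cdot \mathbf{e}^y - \alpha_j I(\mathrm{cls}(j) = \hat{c}) \cdot \mathbf{e}^{\hat{c}} + (\beta^y_{\Lambda^{-1}(j)} - \beta^{\hat{c}}_{\Lambda^{-1}(j)}) \cdot \mathbf{e} - \alpha_j \mathbf{D}_\Lambda (\boldsymbol{\beta}^y - \boldsymbol{\beta}^{\hat{c}}) \right]. \qquad (15)$$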

Algorithm 1 Dictionary learning with LSRC
Require: labeled data set S = {x_i, y_i}, sparse regularization
    coefficient λ, margin b
Ensure: dictionary D
initialize D
set t = 1
while not converged do
    randomly permute data set S
    for each (x, y) ∈ S do
        find sparse code α with Eq. (1)
        find r^c = ||x − D^c α^c||_2^2 for every c = 1...C
        find ĉ with Eq. (7)
        if r^y − r^ĉ + b > 0 then
            d_j ← d_j + Δd_j for every j ∈ Λ by Eq. (15)
            d_j ← d_j / ||d_j||_2 for every j ∈ Λ
        end if
        t ← t + 1
    end for
end while
return D


The resultant dictionary atoms are projected to unit length to
ensure $\mathbf{D} \in \mathcal{D}$. The overall method of LSRC is summarized
in Algorithm 1. The first two terms in Eq. (15) have the effects
of “pulling” the active dictionary atoms of the correct class
towards the signal, and “pushing” the active dictionary atoms
of the most competitive wrong class away from the signal,
which is similar to what has been done in [23] to mimic the
procedure used in LVQ. The third and fourth terms in Eq.
(15) are unique to our LSRC method. They bring in the overall
reconstruction error and every active atom as ingredients for
dictionary updating, which makes sense as the sparse code is
jointly determined by all the atoms in the active set.
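
One stochastic update, from sparse coding to the atom update of Eq. (15) and the unit-length projection, can be sketched as below. This is a hedged illustration rather than the authors' implementation: the sparse code is approximated by ISTA (a stand-in solver), class labels are 0-based, the step size `rho` is passed in by the caller, and the names `sparse_code`, `lsrc_step`, `atom_cls` and `eps` are hypothetical.

```python
import numpy as np

def sparse_code(D, x, lam, n_iter=500):
    """ISTA stand-in for Eq. (1): min_z ||D z - x||_2^2 + lam * |z|_1."""
    step = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2)
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        v = z - step * 2.0 * D.T @ (D @ z - x)
        z = np.sign(v) * np.maximum(np.abs(v) - lam * step, 0.0)
    return z

def lsrc_step(D, atom_cls, x, y, lam, b, rho, eps=1e-6):
    """One LSRC update for a single sample (x, y); returns a new dictionary."""
    n_classes = int(atom_cls.max()) + 1
    alpha = sparse_code(D, x, lam)
    act = np.flatnonzero(alpha)                      # active set Lambda
    # Class-wise errors e^c and residuals r^c.
    errs = [x - D[:, atom_cls == c] @ alpha[atom_cls == c] for c in range(n_classes)]
    r = np.array([e_c @ e_c for e_c in errs])
    r_wrong = r.copy()
    r_wrong[y] = np.inf
    c_hat = int(np.argmin(r_wrong))                  # Eq. (7): most competitive wrong class
    if act.size == 0 or r[y] - r[c_hat] + b <= 0:    # hinge inactive: no update
        return D.copy()

    D_act, a_act, cls_act = D[:, act], alpha[act], atom_cls[act]
    A = D_act.T @ D_act + eps * np.eye(act.size)     # regularized Gram matrix
    e = x - D @ alpha                                # overall reconstruction error

    def beta(c):
        # beta^c = A^{-1} P_Lambda^T P_c D^T e^c: keep only active atoms of class c.
        return np.linalg.solve(A, (D_act.T @ errs[c]) * (cls_act == c))

    d_beta = beta(y) - beta(c_hat)
    # Eq. (15) for all active atoms at once; column k is the update of atom Lambda(k).
    delta = 2.0 * rho * (
        np.outer(errs[y], a_act * (cls_act == y))            # pull: correct class
        - np.outer(errs[c_hat], a_act * (cls_act == c_hat))  # push: competing class
        + np.outer(e, d_beta)                                # third term
        - np.outer(D_act @ d_beta, a_act)                    # fourth term
    )
    D_new = D.copy()
    updated = D_act + delta
    D_new[:, act] = updated / np.linalg.norm(updated, axis=0)  # project to unit length
    return D_new
```

A training loop in the style of Algorithm 1 would call `lsrc_step` once per sample in a random order and decrease `rho` over the iterations; only the atoms in the active set of the current sample are modified.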

4.  EXPERIMENTAL  RESULTS 

We test the proposed method on three benchmark HSI images:
the Indian Pines [27], the University of Pavia, and the Center
of Pavia [28]. The experiment setup and classification accuracies
are listed in Table 1. We compare the performance
of SRC with dictionaries obtained from the full training set
(“Full”) [5], K-means clustering (“K-means”), unsupervised
training (“Unsup”) [12]², the ad-hoc LVQ approach
(“LVQ”) [23], and our method (“LSRC”). Accuracies are also
reported for SVM classifiers with a linear kernel (“SVM”)
and an RBF kernel (“KSVM”), the latter of which is known to
give state-of-the-art results on high-dimensional HSI data
[2]. We follow [5] in pre-processing the
multi-band features. Since our focus is dictionary learning,
all the results shown are based on pixel-wise classification.
Our learned dictionaries have a small size of only 5 atoms per
class, a great reduction compared with the full training set
used in the “Full” and “KSVM” models, yet the overall accuracy
(OA), class-averaged accuracy (AA) and κ coefficient
[29] achieved by LSRC are higher than using the “Full” set
and other dictionary learning methods. Although built on the
linear input space, our method attains better performance than
the nonlinear “KSVM” except for the small Indian Pines data
set, on which SVM shows a better generalization capability.

² Our dictionary is not as good as the one learned in [12] in terms of sparse
reconstruction, but it gives more discriminative sparse codes for SRC.

Table 1. Experiment settings and classification accuracies on three HSI images (OA and AA in %).

Image                #class  #train/test  Parameters         Metric  Full   K-means  Unsup  LVQ    LSRC   SVM    KSVM
Indian Pines         16      1043/9323    120 iterations,    OA      82.96  69.87    66.41  75.39  83.84  74.44  84.52
(200 bands)                               ρ⁰ = 0.01,         AA      76.66  72.11    67.59  73.97  77.69  65.49  79.24
                                          λ = 0.1, b = 0.2   κ       0.805  0.662    0.624  0.723  0.816  0.708  0.823
University of Pavia  9       3921/40002   10 iterations,     OA      78.31  68.01    64.57  73.24  81.08  67.28  79.15
(103 bands)                               ρ⁰ = 0.001,        AA      86.78  77.05    71.66  82.91  85.26  79.66  87.66
                                          λ = 0.05, b = 0.3  κ       0.726  0.596    0.549  0.666  0.754  0.599  0.737
Center of Pavia      9       5536/97940   20 iterations,     OA      97.45  95.86    95.91  96.85  97.93  95.68  96.13
(102 bands)                               ρ⁰ = 0.001,        AA      95.41  91.35    91.95  93.93  96.11  93.77  85.29
                                          λ = 0.1, b = 0.3   κ       0.954  0.925    0.926  0.943  0.962  0.923  0.928

Fig. 1. Accuracy versus training iteration for both
“LVQ” and “LSRC” on the Indian Pines image.
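
The three metrics reported in Table 1 can all be computed from a confusion matrix. The sketch below assumes the standard definitions (OA as the fraction of correctly labeled pixels, AA as the mean of per-class recalls, and κ as Cohen's kappa); the function name `accuracy_metrics` is hypothetical, labels are 0-based, and every class is assumed to appear at least once in `y_true`.

```python
import numpy as np

def accuracy_metrics(y_true, y_pred, n_classes):
    """Return (OA, AA, kappa) for integer label arrays with values in 0..n_classes-1."""
    cm = np.zeros((n_classes, n_classes))
    np.add.at(cm, (y_true, y_pred), 1)             # cm[t, p] counts true t predicted as p
    n = cm.sum()
    oa = np.trace(cm) / n                          # overall accuracy
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))     # mean per-class accuracy
    p_e = (cm.sum(axis=0) @ cm.sum(axis=1)) / n ** 2   # chance agreement
    kappa = (oa - p_e) / (1.0 - p_e)               # Cohen's kappa
    return oa, aa, kappa

# Toy usage: three pixels of class 0 and one of class 1, with one class-0 pixel mislabeled.
oa, aa, kappa = accuracy_metrics(np.array([0, 0, 0, 1]), np.array([0, 0, 1, 1]), 2)
print(oa, aa, kappa)
```

AA weights every class equally regardless of its size, which is why OA and AA can rank methods differently on images with unbalanced classes.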

Fig. 1 demonstrates that our learning algorithm effectively
reduces both training and test errors during training, and
converges to much higher accuracies than “LVQ”. The labels
of the Indian Pines image predicted using the “LVQ” and
“LSRC” methods are also given in Fig. 2 for comparison.

The effect of tuning the margin parameter b is examined in
Table 2. Too small a value of b leads to over-fitting to the training
set, while too large a value biases the classification
objective. A proper value of b is determined using part of
the training data as a validation set.

Fig. 2. Classification results on the Indian Pines image using (a) “LVQ” and
(b) “LSRC”. Color encodes true labels, and black dots denote misclassification.

Table 2. Effect of parameter b on the Indian Pines image.

b               0.0    0.1    0.2    0.3    0.4
Train Acc. (%)  99.81  99.14  98.85  98.27  97.60
Test Acc. (%)   81.30  83.51  83.84  83.19  82.88

5. RELATION TO PRIOR WORK

The work presented here follows the classical framework of
SRC proposed by Wright et al. [4], and focuses on the less
investigated problem of learning a dictionary well suited for
SRC on HSI data. We take advantage of the underlying principle
of Kohonen’s LVQ [22] algorithm and adapt it to the
dictionary design for SRC, leading to a novel LSRC algorithm
which is more sound theoretically and more effective
experimentally than the ad-hoc combination done previously
by Chen et al. [23].


6.  CONCLUSION 

A new dictionary design method for HSI classification is proposed
by optimizing a hinge loss function sharing the same
spirit as LVQ. Our stochastic gradient descent-based algorithm
mimics the updating rule of LVQ, but performs substantially
better than the ad-hoc adaptation of LVQ as well
as other existing dictionary learning approaches. Classification
results achieved with the obtained compact dictionaries
on three HSI images are comparable to or better than the kernel
SVM-based classifier. In future work, we will incorporate
spatial information into the current classification framework
and apply our method to other image modalities.